SubjECTive-QA

Neural Information Processing Systems

Fact-checking is extensively studied in the context of misinformation and disinformation, addressing objective inaccuracies. However, a softer form of misinformation involves responses that are factually correct but lack features such as clarity and relevance. This challenge is prevalent in formal Question-Answer (QA) settings such as press conferences in finance, politics, sports, and other domains, where subjective answers can obscure transparency. Despite this, there is a lack of manually annotated datasets covering subjective features across multiple dimensions. To address this gap, we introduce SubjECTive-QA, a human-annotated dataset built on the QA sessions of Earnings Call Transcripts (ECTs), as the answers given by company representatives are often open to subjective interpretation and scrutiny. The dataset includes 49,446 annotations for long-form QA pairs across six features: Assertive, Cautious, Optimistic, Specific, Clear, and Relevant. These features are carefully selected to encompass the key attributes that reflect the tone of the answers provided during QA sessions across different domains. We find that the best-performing Pre-trained Language Model (PLM), RoBERTa-base, has weighted F1 scores similar to Llama-3-70b-Chat on features with lower subjectivity, such as Relevant and Clear, with a mean difference of 2.17% in their weighted F1 scores. The gap widens on features with higher subjectivity, such as Specific and Assertive, where the mean difference in their weighted F1 scores is 10.01%.
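
A minimal sketch of how such a dataset is typically scored, computing per-feature weighted F1 with scikit-learn. The six feature names come from the abstract; the 0/1/2 annotation scale and the toy labels are assumptions for illustration, not the released data.

```python
# Hedged sketch: per-feature weighted F1 over the six SubjECTive-QA features.
# The 0/1/2 label scale and toy labels below are hypothetical.
from sklearn.metrics import f1_score

FEATURES = ["Assertive", "Cautious", "Optimistic", "Specific", "Clear", "Relevant"]

def per_feature_weighted_f1(y_true, y_pred):
    """y_true, y_pred: dicts mapping feature name -> list of per-answer labels."""
    return {f: f1_score(y_true[f], y_pred[f], average="weighted") for f in FEATURES}

# Toy gold/predicted labels for three QA pairs.
gold = {f: [0, 1, 2] for f in FEATURES}
pred = {f: [0, 1, 1] for f in FEATURES}
print(per_feature_weighted_f1(gold, pred))
```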



From News to Forecast: Integrating Event Analysis in LLM-Based Time Series Forecasting with Reflection

Neural Information Processing Systems

This paper introduces a novel approach that leverages Large Language Models (LLMs) and Generative Agents to enhance time series forecasting by reasoning across both text and time series data. With language as a medium, our method adaptively integrates social events into forecasting models, aligning news content with time series fluctuations to provide richer insights. Specifically, we utilize LLM-based agents to iteratively filter out irrelevant news and employ human-like reasoning to evaluate predictions. This enables the model to analyze complex events, such as unexpected incidents and shifts in social behavior, and to continuously refine both the news-selection logic and the robustness of the agent's output. By integrating selected news events with time series data, we fine-tune a pre-trained LLM to predict sequences of digits representing the time series. The results demonstrate significant improvements in forecasting accuracy, suggesting a potential paradigm shift in time series forecasting through the effective utilization of unstructured news data.
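
An illustrative sketch (not the authors' code) of the input pairing described above: a numeric history serialized digit by digit alongside agent-filtered headlines, ready for supervised fine-tuning. The prompt format and helper names are assumptions.

```python
# Hypothetical example builder: filtered news + digit-serialized series.
def serialize(series, ndigits=1):
    # [101.2, 99.8] -> "1 0 1 . 2 , 9 9 . 8", so each digit is its own token
    return " , ".join(" ".join(f"{x:.{ndigits}f}") for x in series)

def build_example(history, news_items, target):
    prompt = (
        "News:\n" + "\n".join(f"- {n}" for n in news_items)
        + "\nHistory: " + serialize(history) + "\nNext: "
    )
    return {"prompt": prompt, "completion": serialize(target)}

ex = build_example([101.2, 99.8], ["Regulator approves merger"], [100.5])
print(ex["prompt"] + ex["completion"])
```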


Large Pre-trained Time-series Models for Cross-domain Time-series Analysis Tasks

Neural Information Processing Systems

Large pre-trained models have been vital to recent advances in domains like language and vision, making model training for individual downstream tasks more efficient and providing superior performance. However, tackling time-series analysis tasks usually involves designing and training a separate model from scratch, leveraging training data and domain expertise specific to the task. We tackle a significant challenge in pre-training a foundational time-series model from multi-domain time-series datasets: extracting semantically useful tokenized inputs to the model across heterogeneous time series from different domains. We propose Large Pre-trained Time-series Models (LPTM), which introduce a novel method of adaptive segmentation that automatically identifies the optimal dataset-specific segmentation strategy during pre-training. This enables LPTM to perform similarly to, or better than, domain-specific state-of-the-art models when fine-tuned for different downstream time-series analysis tasks, as well as under zero-shot settings. LPTM achieves superior forecasting and time-series classification results while using up to 40% less data and 50% less training time than state-of-the-art baselines.
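
To make adaptive segmentation concrete, here is a toy heuristic that cuts a series where local variance shifts sharply, yielding variable-length segments that could then be embedded as tokens. LPTM learns its segmentation scoring end to end during pre-training, which this sketch does not replicate; all names and thresholds are illustrative.

```python
# Toy variance-based segmentation sketch, not LPTM's learned method.
import numpy as np

def adaptive_segments(x, min_len=4, thresh=1.5):
    """Greedily cut the series where the local variance ratio jumps."""
    cuts, start = [0], 0
    for t in range(min_len, len(x) - min_len):
        left = np.var(x[start:t]) + 1e-8          # variance of current segment
        right = np.var(x[t:t + min_len]) + 1e-8   # variance just ahead
        if max(left, right) / min(left, right) > thresh and t - start >= min_len:
            cuts.append(t)
            start = t
    cuts.append(len(x))
    return [x[a:b] for a, b in zip(cuts, cuts[1:])]

series = np.concatenate([np.random.randn(20), 5 * np.random.randn(20)])
print([len(s) for s in adaptive_segments(series)])  # segment lengths
```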


CausalStock: Deep End-to-end Causal Discovery for News-driven Stock Movement Prediction

Neural Information Processing Systems

Two issues in news-driven multi-stock movement prediction remain unresolved in existing work. On the one hand, "relation discovery" is pivotal when leveraging the price information of other stocks to achieve accurate stock movement prediction. Given that stock relations are often unidirectional, such as the "supplier-consumer" relationship, causal relations are more appropriate for capturing the impact between stocks. On the other hand, news data contain substantial noise, making it difficult to extract effective information. With these two issues in mind, we propose a novel framework called CausalStock for news-driven multi-stock movement prediction, which discovers the temporal causal relations between stocks.
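
The sketch below shows one common way to parameterize directed (asymmetric) stock-to-stock influence, in the spirit of the relation-discovery component described above. CausalStock's actual causal-discovery mechanism and its handling of news are more involved; every name in this snippet is illustrative.

```python
# Hedged sketch: a learnable directed adjacency over stocks.
import torch
import torch.nn as nn

class DirectedRelation(nn.Module):
    def __init__(self, n_stocks, d):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_stocks, n_stocks))  # i <- j strengths
        self.proj = nn.Linear(d, d)

    def forward(self, h):                      # h: (n_stocks, d) per-stock features
        a = torch.sigmoid(self.logits)         # directed, not symmetric
        a = a * (1 - torch.eye(a.size(0)))     # no self-loops
        return self.proj(a @ h)                # each stock aggregates its parents

h = torch.randn(5, 8)
print(DirectedRelation(5, 8)(h).shape)        # torch.Size([5, 8])
```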


Let AI fix your stock portfolio (and your anxiety)

Mashable

TL;DR: Sterling Stock Picker gives you AI-powered investment advice, a portfolio builder, and plain-English explanations for a lifetime -- now just $55.19 with code SAVE20. I was watching the first of the month's market crashes and wondered, "Does it make sense to invest right now?" I'd always been curious about the stock market, at least in terms of how people actually got rich by practically gambling, but I didn't have a single clue where to begin -- let alone how to even buy a stock. Once I saw a TikTok calling this period a once-in-a-lifetime opportunity (take that with a grain of salt), I decided to take my chance. But I needed help researching everything, like what stocks to choose and how to track them. That's when I found Sterling Stock Picker, and made my first investment.


MixSeq: Connecting Macroscopic Time Series Forecasting with Microscopic Time Series Data

Neural Information Processing Systems

Time series forecasting is widely used in business intelligence, e.g., to forecast stock prices and sales and to help analyze data trends. Most time series of interest are macroscopic time series that are aggregated from microscopic data. However, little of the literature studies forecasting macroscopic time series by leveraging data at the microscopic level, rather than directly modeling the macroscopic series. In this paper, we assume that the microscopic time series follow some unknown mixture of probabilistic distributions. We show theoretically that, as we identify the ground-truth latent mixture components, the estimation of the time series from each component improves because of lower variance, thereby benefiting the estimation of the macroscopic time series as well. Inspired by the power of Seq2seq and its variants for modeling time series data, we propose Mixture of Seq2seq (MixSeq), an end-to-end mixture model that clusters microscopic time series, where all components come from a family of Seq2seq models parameterized by different parameters. Extensive experiments on both synthetic and real-world data show the superiority of our approach.
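
The variance claim can be checked with quick arithmetic: by the law of total variance, a mixture's pooled variance equals the average within-component variance plus the variance of the component means, so conditioning on the true component removes the second term. The toy numbers below are invented for illustration, not taken from the paper.

```python
# Numeric check of the variance-reduction argument with toy mixture data.
import numpy as np

rng = np.random.default_rng(0)
comp_a = rng.normal(0.0, 1.0, 10_000)   # component A: N(0, 1)
comp_b = rng.normal(5.0, 1.0, 10_000)   # component B: N(5, 1)
pooled = np.concatenate([comp_a, comp_b])

print(f"pooled variance:        {pooled.var():.2f}")                      # ~ 1 + 2.5^2 = 7.25
print(f"within-component (avg): {(comp_a.var() + comp_b.var()) / 2:.2f}")  # ~ 1.00
```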


A Background

Neural Information Processing Systems

In this section, we provide an overview of blockchain technology and cryptocurrency, laying the groundwork for understanding the subsequent discussions in this paper. Blockchain technology has gained growing attention recently for its strong security features and decentralized structure. It is characterized by a sequence of cryptographically secured blocks that operate on a network of nodes [42]. This design ensures data immutability and verifiability while allowing universal access, enabling participants to interact with the ledger from anywhere at any time. Once recorded on the ledger, transactions become irreversible and are executed securely and transparently, which helps safeguard the integrity of data exchanges. With the support of blockchain technology, cryptocurrencies have surged in popularity as an innovative means of conducting secure digital transactions. Unlike traditional currencies, cryptocurrencies operate without a centralized authority and are managed through decentralized systems. This decentralization maintains participant anonymity, offering robust privacy protection; however, it complicates efforts to identify fraudulent activities within the market.
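
The immutability property described above follows from each block committing to the hash of its predecessor. A minimal hash-chain sketch (omitting consensus, signatures, and Merkle trees, and using invented transaction strings) shows how tampering with one block breaks every later link:

```python
# Minimal hash-chain sketch of block linkage and tamper evidence.
import hashlib, json

def block(prev_hash, txs):
    body = json.dumps({"prev": prev_hash, "txs": txs}, sort_keys=True)
    return {"prev": prev_hash, "txs": txs,
            "hash": hashlib.sha256(body.encode()).hexdigest()}

genesis = block("0" * 64, ["alice->bob:5"])
b1 = block(genesis["hash"], ["bob->carol:2"])

# Tampering with genesis invalidates b1's back-pointer:
genesis["txs"][0] = "alice->bob:500"
print(b1["prev"] == block("0" * 64, genesis["txs"])["hash"])  # False
```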


Multi-Chain Graphs of Graphs: A New Approach to Analyzing Blockchain Datasets

Neural Information Processing Systems

Machine learning applied to blockchain graphs offers significant opportunities for enhanced data analysis and applications. However, the potential of this field is constrained by the lack of a large-scale, cross-chain dataset that includes hierarchical graph-level data. To address this issue, we present novel datasets that provide detailed label information at the token level and integrate interactions between tokens across multiple blockchain platforms.
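
One plausible way to represent such hierarchical, cross-chain data is a "graph of graphs": each token's transaction network is itself a graph, and tokens are nodes in a macro graph whose edges capture cross-token interaction. The schema below is a hypothetical illustration, not the released dataset format.

```python
# Hypothetical graph-of-graphs layout; field names and labels are invented.
import networkx as nx

# Micro level: one transaction graph per (chain, token).
token_graphs = {
    ("ethereum", "TOKEN_A"): nx.DiGraph([("addr1", "addr2")]),
    ("bsc", "TOKEN_B"): nx.DiGraph([("addr3", "addr1")]),
}

# Macro level: tokens as nodes, edges for cross-token/cross-chain interaction.
macro = nx.Graph()
for key, g in token_graphs.items():
    macro.add_node(key, chain=key[0], label="benign", graph=g)  # token-level label
macro.add_edge(("ethereum", "TOKEN_A"), ("bsc", "TOKEN_B"), shared_addresses=1)

print(macro.nodes[("ethereum", "TOKEN_A")]["chain"])  # ethereum
```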